Policy Gradient Coagent Networks
We present a novel class of actor-critic algorithms for actors consisting of sets of interacting modules. We present, theoretically analyze, and empirically evaluate an update rule for each module that requires only local information: the module's input, output, and the TD error broadcast by a critic. Such updates are necessary when computing compatible features becomes prohibitively difficult, and they are also desirable because they increase the biological plausibility of reinforcement learning methods.
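To make the locality of such an update concrete, the following is a minimal sketch of a single module as a Bernoulli-logistic coagent; the class name, learning rate, and logistic parameterization are our illustrative choices rather than the paper's exact construction.

```python
import numpy as np

class BernoulliCoagent:
    """One module with a Bernoulli-logistic policy over a binary output.

    The update uses only local information: the module's input x, its
    sampled output u, and a TD error delta broadcast by a critic.
    """

    def __init__(self, n_inputs, lr=0.01, seed=0):
        self.w = np.zeros(n_inputs)
        self.lr = lr
        self.rng = np.random.default_rng(seed)

    def act(self, x):
        p = 1.0 / (1.0 + np.exp(-self.w @ x))  # P(u = 1 | x)
        u = float(self.rng.random() < p)
        self._x, self._p, self._u = x, p, u    # cache for the update
        return u

    def update(self, delta):
        # Local REINFORCE-style rule: w += lr * delta * grad ln pi(u | x);
        # for the logistic parameterization, grad ln pi(u | x) = (u - p) x.
        self.w += self.lr * delta * (self._u - self._p) * self._x
```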
Coagent Networks: Generalized and Scaled
Kostas, James E., Jordan, Scott M., Chandak, Yash, Theocharous, Georgios, Gupta, Dhawal, White, Martha, da Silva, Bruno Castro, Thomas, Philip S.
Coagent networks for reinforcement learning (RL) [Thomas and Barto, 2011] provide a powerful and flexible framework for deriving principled learning rules for arbitrary stochastic neural networks. The coagent framework offers an alternative to backpropagation-based deep learning (BDL) that overcomes some of backpropagation's main limitations. For example, coagent networks can compute different parts of the network asynchronously (at different rates or at different times), can incorporate non-differentiable components that cannot be used with backpropagation, and can explore at levels higher than their action spaces (that is, they can be designed as hierarchical networks for exploration and/or temporal abstraction). The coagent framework is not just an alternative to BDL, however; the two approaches can be blended: BDL can be combined with coagent learning rules to create architectures with the advantages of both. This work generalizes the coagent theory and learning rules of previous works, providing more flexibility for network architecture design within the coagent framework. It also studies one of the chief disadvantages of coagent networks: high-variance updates in networks that have many coagents and do not use backpropagation. We show that a coagent algorithm with a policy network that does not use backpropagation can scale to a challenging RL domain with a high-dimensional state and action space (the MuJoCo Ant environment), learning reasonable (although not state-of-the-art) policies. These contributions motivate and provide a more general theoretical foundation for future work on coagent networks.
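As a rough illustration of how such learning rules compose into a full, backpropagation-free policy network, here is a hedged sketch of a two-layer stack of Bernoulli-logistic coagents in which every unit applies the same local rule to a single broadcast TD error; the layer sizes and learning rate are illustrative, and a real implementation would follow the algorithms in the paper.

```python
import numpy as np

class CoagentLayer:
    """A layer of independent Bernoulli-logistic coagents.

    No gradient crosses layer boundaries: each unit learns from its own
    input/output and a TD error broadcast to the whole network, which is
    what allows different parts of the network to run asynchronously.
    """

    def __init__(self, n_in, n_out, lr=0.05, seed=0):
        self.W = np.zeros((n_out, n_in))
        self.lr = lr
        self.rng = np.random.default_rng(seed)

    def act(self, x):
        p = 1.0 / (1.0 + np.exp(-self.W @ x))
        u = (self.rng.random(len(p)) < p).astype(float)
        self._x, self._p, self._u = x, p, u
        return u

    def update(self, delta):
        # Same local rule for every unit: dW = lr * delta * grad ln pi(u | x).
        self.W += self.lr * delta * np.outer(self._u - self._p, self._x)

# Usage: a two-layer coagent policy (sizes are illustrative).
hidden = CoagentLayer(3, 4, seed=1)
output = CoagentLayer(4, 1, seed=2)
s = np.array([0.2, -1.0, 0.5])   # state features
a = output.act(hidden.act(s))    # forward pass by sampling
delta = 0.7                      # TD error from a critic (placeholder value)
hidden.update(delta)
output.update(delta)
```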
Every Hidden Unit Maximizing Output Weights Maximizes The Global Reward
For a network of stochastic units trained on a reinforcement learning task, one biologically plausible way of learning is to treat each unit as a reinforcement learning unit and train every unit with REINFORCE using the same global reward signal. In this case, only a global reward signal has to be broadcast to all units, and the resulting learning rule is local. Although this learning rule follows the gradient of the return in expectation, it suffers from high variance and cannot be used to train a deep network in practice. In this paper, we propose an algorithm called Weight Maximization, which significantly speeds up learning compared with applying REINFORCE to all units. Essentially, we replace the global reward delivered to each hidden unit with the change in the norm of that unit's output weights, so that each hidden unit in the network tries to maximize the norm of its output weights instead of the global reward. We found that the new algorithm solves simple reinforcement learning tasks significantly faster than the baseline model. We also prove that the resulting learning rule approximately follows gradient ascent on the reward in expectation when applied to a multi-layer network of Bernoulli logistic units. This illustrates an example of intelligent behavior arising from a population of self-interested hedonistic neurons, corresponding to Klopf's hedonistic neuron hypothesis.
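The following is a minimal sketch of our reading of the Weight Maximization idea for a one-hidden-layer Bernoulli-logistic network: the output unit learns with ordinary REINFORCE on the global reward, while each hidden unit's reward is replaced by the change in the magnitude of its outgoing weight. The sizes, learning rate, and use of the absolute value as the "norm" of a single outgoing weight are our assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = np.zeros((4, 3))  # hidden weights: 4 Bernoulli-logistic units, 3 inputs
w2 = np.zeros(4)       # output weights: one outgoing weight per hidden unit
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def trial(x, reward_fn):
    global W1, w2
    p1 = sigmoid(W1 @ x)
    h = (rng.random(4) < p1).astype(float)  # hidden units sample spikes
    p2 = sigmoid(w2 @ h)
    a = float(rng.random() < p2)            # output unit samples an action
    r = reward_fn(a)                        # global reward

    # Output unit: plain REINFORCE on the global reward r.
    w2_old = w2.copy()
    w2 = w2 + lr * r * (a - p2) * h

    # Hidden units (Weight Maximization): each unit's "reward" is the
    # change in the magnitude of its own outgoing weight, not r.
    pseudo_r = np.abs(w2) - np.abs(w2_old)
    W1 = W1 + lr * (pseudo_r * (h - p1))[:, None] * x[None, :]
    return a, r
```

A toy call such as `trial(np.array([1.0, 0.0, 1.0]), lambda a: a)` drives the output weights toward rewarding actions, and the hidden units toward features whose outgoing weights grow.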
Training spiking neural networks using reinforcement learning
Neurons in the brain communicate with each other through discrete action spikes, as opposed to the continuous signal transmission in artificial neural networks. Therefore, the traditional techniques for optimizing parameters in neural networks, which rely on the assumption of differentiable activation functions, are no longer applicable to modeling learning processes in the brain. In this project, we propose biologically plausible alternatives to backpropagation to facilitate the training of spiking neural networks. We primarily focus on investigating the candidacy of reinforcement learning (RL) rules for solving the spatial and temporal credit assignment problems to enable decision-making in complex tasks. In one approach, we consider each neuron in a multi-layer neural network as an independent RL agent forming its own representation of the feature space, while the network as a whole forms the representation of the complex policy that solves the task at hand. In the other approach, we apply the reparameterization trick to enable differentiation through stochastic transformations in spiking neural networks. We compare and contrast the two approaches by applying them to traditional RL domains such as gridworld, cart-pole, and mountain car. Further, we suggest variations and enhancements to enable future research in this area.
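For the second approach, one common concrete instance of the reparameterization trick for binary spikes is the binary-Concrete (Gumbel-sigmoid) relaxation sketched below; this is our illustrative choice, and the project may use a different relaxation or temperature schedule.

```python
import torch

def relaxed_spike(logits, tau=0.5):
    """Differentiable surrogate for a Bernoulli spike.

    Binary-Concrete / Gumbel-sigmoid relaxation: inject logistic noise,
    then squash, so the "spike" is a reparameterized sample in (0, 1)
    through which gradients can flow.
    """
    u = torch.rand_like(logits)
    noise = torch.log(u) - torch.log1p(-u)        # logistic noise
    return torch.sigmoid((logits + noise) / tau)  # soft spike; tau -> 0 hardens it

# Usage: gradients flow through the sampled spike back to the weights.
w = torch.zeros(3, requires_grad=True)
x = torch.tensor([1.0, -0.5, 0.2])
spike = relaxed_spike(w @ x)
loss = (1.0 - spike) ** 2   # toy objective: encourage the unit to spike
loss.backward()             # d(loss)/dw is well defined despite sampling
```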
Parameter Sharing in Coagent Networks
In this paper, we prove a theorem that generalizes the Coagent Network Policy Gradient Theorem (Kostas et al., 2019) to the setting where parameters are shared among the function approximators involved. This provides the theoretical foundation for using any pattern of parameter sharing and for leveraging the freedom in the graph structure of the network to exploit relational bias in a given task. As another application, we apply our result to give a more intuitive proof of the Hierarchical Option Critic Policy Gradient Theorem, first shown by Riemer et al. (2019).
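For orientation, here is a hedged sketch of the shared-parameter statement in our own notation, modeled on the unshared theorem of Kostas et al. (2019): if coagents i = 1, ..., m all read one shared parameter vector, the global policy gradient is the sum of the local policy-gradient terms, with each term differentiating the shared parameters only as they appear in that coagent.

```latex
% Notation ours; X^i_t and U^i_t are coagent i's local input and output,
% G_t is the discounted return, and \theta is the shared parameter vector.
\nabla J(\theta)
  = \sum_{i=1}^{m} \mathbb{E}\!\left[
      \sum_{t=0}^{\infty} \gamma^{t}\, G_t\,
      \nabla_{\theta} \ln \pi_i\!\left(U^i_t \mid X^i_t, \theta\right)
    \right]
```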